Identifying the same individual across different scenes is an important yet difficult task in intelligent video surveillance. The main difficulty lies in preserving the similarity of the same person under large variations in appearance and structure while still discriminating between different individuals. In this paper, we present a scalable, distance-driven feature learning framework based on a deep neural network for person re-identification, and demonstrate its effectiveness in handling these challenges. Specifically, given training images with class labels (person IDs), we first produce a large number of triplet units, each of which contains three images: one person with a matched reference and a mismatched reference. Treating the units as the input, we build a convolutional neural network to generate layered representations, followed by the $L_2$ distance metric. Through parameter optimization, our framework tends to maximize the relative distance between the matched pair and the mismatched pair for each triplet unit. A nontrivial issue arising with the framework is that the triplet organization cubically enlarges the number of training triplets, as one image can be involved in several triplet units. To overcome this problem, we develop an effective triplet generation scheme and an optimized gradient descent algorithm, so that the computational load depends mainly on the number of original images rather than on the number of triplets. On several challenging databases, our approach achieves promising results and outperforms other state-of-the-art approaches.
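
The triplet organization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_triplets`, `sq_l2`, `triplet_loss`), the hinge form of the loss, and the `margin` value are assumptions chosen for illustration, and plain feature vectors stand in for the output of the convolutional network.

```python
from itertools import permutations


def make_triplets(labels):
    """Enumerate (anchor, matched, mismatched) index triplets from person IDs."""
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # anchor and matched reference must share a person ID
        for n, label in enumerate(labels):
            if label != labels[a]:
                triplets.append((a, p, n))
    return triplets


def sq_l2(u, v):
    """Squared L2 distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(features, triplets, margin=1.0):
    """Mean hinge loss over triplets (assumed form, not taken from the paper).

    A triplet contributes zero once the mismatched pair is farther apart
    than the matched pair by at least `margin`.
    """
    total = 0.0
    for a, p, n in triplets:
        d_pos = sq_l2(features[a], features[p])
        d_neg = sq_l2(features[a], features[n])
        total += max(0.0, margin + d_pos - d_neg)
    return total / len(triplets) if triplets else 0.0


# Toy data: two persons, two images each; features are made up.
labels = [0, 0, 1, 1]
features = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.0, 0.1]]
triplets = make_triplets(labels)
loss = triplet_loss(features, triplets)
```

Even this toy set of four images yields eight triplets, which hints at the growth the abstract refers to. The exhaustive enumeration here is only for illustration; the triplet generation scheme and the optimized gradient descent mentioned in the abstract are what keep the cost tied to the number of images, and they are not reproduced in this sketch.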